Scalable Detection of MPI-2 Remote Memory Access Inefficiency Patterns
Abstract
Wait states in parallel applications can be identified by scanning event traces for characteristic patterns. In our earlier work, we defined such inefficiency patterns for MPI-2 one-sided communication, but their detection still relied on a serial trace-analysis scheme with limited scalability. In this article, we show how wait states in one-sided communication can be detected more scalably by taking advantage of a new trace-analysis approach based on parallel replay, which was originally developed for MPI-1 point-to-point and collective communication. Moreover, we demonstrate the scalability of our method and its usefulness in the optimization cycle with applications running on up to 32,768 cores.
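To illustrate what "scanning event traces for characteristic patterns" and "parallel replay" can mean for one-sided communication, here is a minimal sketch of a simplified wait-at-fence check. It assumes a hypothetical trace format in which each process records `ENTER`/`EXIT` events with timestamps, and it treats `MPI_Win_fence` as if it synchronized all processes, so time spent inside the fence before the last process arrives counts as waiting. The `Event` type and the functions `fence_intervals` and `wait_at_fence` are illustrative assumptions, not the authors' implementation. In a real parallel replay each rank would traverse only its own trace and obtain the latest enter time with a collective reduction; here that reduction is simulated with `max()`.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str    # "ENTER" or "EXIT" (hypothetical trace format)
    region: str  # name of the instrumented call, e.g. "MPI_Win_fence"
    time: float  # timestamp in seconds


def fence_intervals(trace):
    """Return (enter, exit) timestamp pairs for each MPI_Win_fence call in one process's trace."""
    intervals = []
    start = None
    for ev in trace:
        if ev.region != "MPI_Win_fence":
            continue  # ignore all other events, e.g. MPI_Put or computation
        if ev.kind == "ENTER":
            start = ev.time
        elif ev.kind == "EXIT" and start is not None:
            intervals.append((start, ev.time))
            start = None
    return intervals


def wait_at_fence(traces):
    """Replay the per-process traces one fence instance at a time and sum waiting time per rank."""
    if not traces:
        return []
    per_rank = [fence_intervals(t) for t in traces]
    n_fences = len(per_rank[0])
    if any(len(iv) != n_fences for iv in per_rank):
        raise ValueError("all processes must record the same number of fences")
    waits = [0.0] * len(traces)
    for k in range(n_fences):
        # Stand-in for a collective max-reduction over the enter timestamps.
        last_enter = max(iv[k][0] for iv in per_rank)
        for rank, iv in enumerate(per_rank):
            enter, exit_ = iv[k]
            # Time inside the fence before the last process arrived,
            # capped at the time this rank actually spent in the call.
            waits[rank] += max(0.0, min(last_enter, exit_) - enter)
    return waits


if __name__ == "__main__":
    traces = [
        [Event("ENTER", "MPI_Put", 0.5), Event("EXIT", "MPI_Put", 0.9),
         Event("ENTER", "MPI_Win_fence", 1.0), Event("EXIT", "MPI_Win_fence", 3.1)],
        [Event("ENTER", "MPI_Win_fence", 3.0), Event("EXIT", "MPI_Win_fence", 3.1)],
    ]
    print(wait_at_fence(traces))  # prints [2.0, 0.0]
```

In this toy run, rank 0 enters the fence two seconds before rank 1 and is charged that interval as a wait state, while rank 1, the last to arrive, waits for nothing. Replaying the traces in lockstep rather than merging them into one global timeline is what lets such a check be distributed across as many processes as the application itself used.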
Similar articles
Specification of Inefficiency Patterns for MPI-2 One-Sided Communication
Automatic performance analysis of parallel programs can be accomplished by scanning event traces of program execution for patterns representing inefficient behavior. The temporal and spatial relationships between individual runtime events recorded in the event trace allow the recognition of wait states as a result of suboptimal parallel interaction. In our earlier work [1], we have shown how pa...
Using InfiniBand for a scalable compute infrastructure
Introduction; InfiniBand technology; ...
Gravel: A Communication Library to Fast Path MPI
Remote Direct Memory Access (RDMA) technology allows data to move from the memory of one system into another system’s memory without involving either one’s CPU. This capability enables communication-computation overlapping, which is highly desirable for addressing the costly communication overhead in cluster computing. This paper describes the consumer-initiated and producer-initiated protocols...
Performance Evaluation of Remote Memory Access (RMA) Programming on Shared Memory Parallel Computers
The purpose of this study is to evaluate the feasibility of remote memory access (RMA) programming on shared memory parallel computers. We discuss different RMA-based implementations of selected CFD application benchmark kernels and compare them to corresponding message-passing-based codes. For the message-passing implementation we use MPI point-to-point and global communication routines. For th...
Design and Implementation of Open MPI over QsNet/Elan4
Open MPI is a recently initiated project to provide a fault-tolerant, multi-network-capable, production-quality implementation of the MPI-2 [20] interface, based on experience gained from the FT-MPI [8], LA-MPI [10], LAM/MPI [28], and MVAPICH [23] projects. Its initial communication architecture is layered on top of TCP/IP. In this paper, we have designed and implemented Open MPI point-to-point laye...
Journal title:
- IJHPCA
Volume 26
Publication date: 2009